Mining millions of metaphors
Internal identifier: 000357 (Main/Exploration); previous: 000356; next: 000358
Authors: Brad Pasanek [United States]; D. Sculley [United States]
Source:
- Literary and Linguistic Computing [0268-1145]; 2008-09.
Abstract
One of the first decisions made in any research concerns the selection of an appropriate scale of analysis—are we looking out into the heavens, or down into atoms? To conceive a digital library as a collection of a million books may restrict analysis to only one level of granularity. In this article, we examine the consequences and opportunities resulting from a shift in scale, where the desired unit of interpretation is something smaller than a text: it is a keyword, a motif, or a metaphor. A million books distilled into a billion meaningful components become raw material for a history of language, literature, and thought that has never before been possible. While books herded into genres and organized by period remain irregular, idiosyncratic, and meaningful in only the most shifting and context-dependent ways, keywords or metaphors are lowest common denominators. At the semantic level—the level of words, images, and metaphors—long-term regularity and patterns emerge in collection, analysis, and taxonomy. This article follows the foregoing course of thought through three stages: first, the manual curation of a high quality database of metaphors; second, the expansion of this database through automated and human-assisted techniques; finally, the description of future experiments and opportunities for the application of machine learning, data mining, and natural language processing techniques to help find patterns and meaning concealed at this important level of granularity.
DOI: 10.1093/llc/fqn010
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 003881
- to stream Istex, to step Curation: 002C57
- to stream Istex, to step Checkpoint: 000314
- to stream Main, to step Merge: 000359
- to stream Main, to step Curation: 000357
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>Mining millions of metaphors</title>
<author wicri:is="90%"><name sortKey="Pasanek, Brad" sort="Pasanek, Brad" uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name>
</author>
<author wicri:is="90%"><name sortKey="Sculley, D" sort="Sculley, D" uniqKey="Sculley D" first="D." last="Sculley">D. Sculley</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:DB439E09F0F5EA1C7B2555FC52451FEDD4A07E3E</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1093/llc/fqn010</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HXZ-5HM0XZ7M-R/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">003881</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">003881</idno>
<idno type="wicri:Area/Istex/Curation">002C57</idno>
<idno type="wicri:Area/Istex/Checkpoint">000314</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000314</idno>
<idno type="wicri:doubleKey">0268-1145:2008:Pasanek B:mining:millions:of</idno>
<idno type="wicri:Area/Main/Merge">000359</idno>
<idno type="wicri:Area/Main/Curation">000357</idno>
<idno type="wicri:Area/Main/Exploration">000357</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">Mining millions of metaphors</title>
<author wicri:is="90%"><name sortKey="Pasanek, Brad" sort="Pasanek, Brad" uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name>
<affiliation wicri:level="1"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Virginia, Charlottesville</wicri:regionArea>
<wicri:noRegion>Charlottesville</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Tufts University, Medford</wicri:regionArea>
<wicri:noRegion>Medford</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author wicri:is="90%"><name sortKey="Sculley, D" sort="Sculley, D" uniqKey="Sculley D" first="D." last="Sculley">D. Sculley</name>
<affiliation wicri:level="1"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Virginia, Charlottesville</wicri:regionArea>
<wicri:noRegion>Charlottesville</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Tufts University, Medford</wicri:regionArea>
<wicri:noRegion>Medford</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Literary and Linguistic Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint><publisher>Oxford University Press</publisher>
<date type="published" when="2008-09">2008-09</date>
<biblScope unit="volume">23</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="345">345</biblScope>
<biblScope unit="page" to="360">360</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract">One of the first decisions made in any research concerns the selection of an appropriate scale of analysis—are we looking out into the heavens, or down into atoms? To conceive a digital library as a collection of a million books may restrict analysis to only one level of granularity. In this article, we examine the consequences and opportunities resulting from a shift in scale, where the desired unit of interpretation is something smaller than a text: it is a keyword, a motif, or a metaphor. A million books distilled into a billion meaningful components become raw material for a history of language, literature, and thought that has never before been possible. While books herded into genres and organized by period remain irregular, idiosyncratic, and meaningful in only the most shifting and context-dependent ways, keywords or metaphors are lowest common denominators. At the semantic level—the level of words, images, and metaphors—long-term regularity and patterns emerge in collection, analysis, and taxonomy. This article follows the foregoing course of thought through three stages: first, the manual curation of a high quality database of metaphors; second, the expansion of this database through automated and human-assisted techniques; finally, the description of future experiments and opportunities for the application of machine learning, data mining, and natural language processing techniques to help find patterns and meaning concealed at this important level of granularity.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
</list>
<tree><country name="États-Unis"><noRegion><name sortKey="Pasanek, Brad" sort="Pasanek, Brad" uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name>
</noRegion>
<name sortKey="Pasanek, Brad" sort="Pasanek, Brad" uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name>
<name sortKey="Pasanek, Brad" sort="Pasanek, Brad" uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name>
<name sortKey="Sculley, D" sort="Sculley, D" uniqKey="Sculley D" first="D." last="Sculley">D. Sculley</name>
<name sortKey="Sculley, D" sort="Sculley, D" uniqKey="Sculley D" first="D." last="Sculley">D. Sculley</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000357 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000357 | SxmlIndent | more
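Without a Dilib installation, the same record can be inspected with a general-purpose XML parser. The following is a minimal sketch, not part of Dilib: it assumes the record shown above is available as a text string. Because the record uses the `wicri:` prefix without declaring it, the sketch binds that prefix to a placeholder namespace URI so that a standard parser accepts the input. The function name `summarize_record`, the placeholder URI, and the trimmed `SAMPLE` record are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder URI (assumption): the record uses the wicri: prefix without
# declaring it, so a namespace is bound here only to make the XML parseable.
WICRI_NS = "urn:example:wicri"

# Trimmed copy of the record above, kept short for illustration.
SAMPLE = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<publicationStmt><idno type="doi">10.1093/llc/fqn010</idno></publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">Mining millions of metaphors</title>
<author wicri:is="90%"><name first="Brad" last="Pasanek">Brad Pasanek</name></author>
<author wicri:is="90%"><name first="D." last="Sculley">D. Sculley</name></author>
</analytic>
<series><title level="j">Literary and Linguistic Computing</title></series>
</biblStruct></sourceDesc></fileDesc></teiHeader></TEI></record>"""


def summarize_record(xml_text):
    """Return title, authors, DOI and journal from a Wicri <record> string."""
    # Declare the wicri: prefix on the root <record> element before parsing.
    patched = re.sub(
        r"<record\b", f'<record xmlns:wicri="{WICRI_NS}"', xml_text, count=1
    )
    root = ET.fromstring(patched)
    analytic = root.find(".//analytic")
    return {
        "title": analytic.findtext("title"),
        "authors": [a.findtext("name") for a in analytic.findall("author")],
        "doi": root.findtext(".//idno[@type='doi']"),
        "journal": root.findtext(".//series/title"),
    }


if __name__ == "__main__":
    for field, value in summarize_record(SAMPLE).items():
        print(f"{field}: {value}")
```

The regular-expression patch is a workaround for the undeclared prefix. If the actual Wicri namespace URI is known, declaring it in the source XML would be the cleaner choice.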
To add a link to this page in the Wicri network
{{Explor lien |wiki= Wicri/Informatique |area= SgmlV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:DB439E09F0F5EA1C7B2555FC52451FEDD4A07E3E |texte= Mining millions of metaphors }}
This area was generated with Dilib version V0.6.33.